326 research outputs found

    Energy Efficient Hardware Design of Neural Networks

    Get PDF
    abstract: Hardware implementation of deep neural networks is earning significant importance nowadays. Deep neural networks are mathematical models that use learning algorithms inspired by the brain. Numerous deep learning algorithms such as multi-layer perceptrons (MLP) have demonstrated human-level recognition accuracy in image and speech classification tasks. Multiple layers of processing elements called neurons with several connections between them called synapses are used to build these networks. Hence, it involves operations that exhibit a high level of parallelism making it computationally and memory intensive. Constrained by computing resources and memory, most of the applications require a neural network which utilizes less energy. Energy efficient implementation of these computationally intense algorithms on neuromorphic hardware demands a lot of architectural optimizations. One of these optimizations would be the reduction in the network size using compression and several studies investigated compression by introducing element-wise or row-/column-/block-wise sparsity via pruning and regularization. Additionally, numerous recent works have concentrated on reducing the precision of activations and weights with some reducing to a single bit. However, combining various sparsity structures with binarized or very-low-precision (2-3 bit) neural networks have not been comprehensively explored. Output activations in these deep neural network algorithms are habitually non-binary making it difficult to exploit sparsity. On the other hand, biologically realistic models like spiking neural networks (SNN) closely mimic the operations in biological nervous systems and explore new avenues for brain-like cognitive computing. These networks deal with binary spikes, and they can exploit the input-dependent sparsity or redundancy to dynamically scale the amount of computation in turn leading to energy-efficient hardware implementation. This work discusses configurable spiking neuromorphic architecture that supports multiple hidden layers exploiting hardware reuse. It also presents design techniques for minimum-area/-energy DNN hardware with minimal degradation in accuracy. Area, performance and energy results of these DNN and SNN hardware is reported for the MNIST dataset. The Neuromorphic hardware designed for SNN algorithm in 28nm CMOS demonstrates high classification accuracy (>98% on MNIST) and low energy (51.4 - 773 (nJ) per classification). The optimized DNN hardware designed in 40nm CMOS that combines 8X structured compression and 3-bit weight precision showed 98.4% accuracy at 33 (nJ) per classification.Dissertation/ThesisMasters Thesis Electrical Engineering 201

    Energy Modeling of Machine Learning Algorithms on General Purpose Hardware

    Get PDF
    abstract: Articial Neural Network(ANN) has become a for-bearer in the field of Articial Intel- ligence. The innovations in ANN has led to ground breaking technological advances like self-driving vehicles,medical diagnosis,speech Processing,personal assistants and many more. These were inspired by evolution and working of our brains. Similar to how our brain evolved using a combination of epigenetics and live stimulus,ANN require training to learn patterns.The training usually requires a lot of computation and memory accesses. To realize these systems in real embedded hardware many Energy/Power/Performance issues needs to be solved. The purpose of this research is to focus on methods to study data movement requirement for generic Neural Net- work along with the energy associated with it and suggest some ways to improve the design.Many methods have suggested ways to optimize using mix of computation and data movement solutions without affecting task accuracy. But these methods lack a computation model to calculate the energy and depend on mere back of the envelope calculation. We realized that there is a need for a generic quantitative analysis for memory access energy which helps in better architectural exploration. We show that the present architectural tools are either incompatible or too slow and we need a better analytical method to estimate data movement energy. We also propose a simplistic yet effective approach that is robust and expandable by users to support various systems.Dissertation/ThesisMasters Thesis Electrical Engineering 201

    Algorithm and Hardware Design of Discrete-Time Spiking Neural Networks Based on Back Propagation with Binary Activations

    Full text link
    We present a new back propagation based training algorithm for discrete-time spiking neural networks (SNN). Inspired by recent deep learning algorithms on binarized neural networks, binary activation with a straight-through gradient estimator is used to model the leaky integrate-fire spiking neuron, overcoming the difficulty in training SNNs using back propagation. Two SNN training algorithms are proposed: (1) SNN with discontinuous integration, which is suitable for rate-coded input spikes, and (2) SNN with continuous integration, which is more general and can handle input spikes with temporal information. Neuromorphic hardware designed in 40nm CMOS exploits the spike sparsity and demonstrates high classification accuracy (>98% on MNIST) and low energy (48.4-773 nJ/image).Comment: 2017 IEEE Biomedical Circuits and Systems (BioCAS

    Reconfigurable Turbo/Viterbi Channel Decoder in the Coarse-Grained Montium Architecture

    Get PDF
    Mobile wireless communication systems become multi-mode systems. These future mobile systems employ multiple wireless communication standards, which are different by means of algorithms that are used to implement the baseband processing and the channel decoding. Efficient implementation of multiple wireless standards in mobile terminals requires energy-efficient and flexible hardware. We propose to implement both the baseband processing and channel decoding in a heterogeneous reconfigurable system-on-chip. The system-on-chip contains many processing elements of different granularities, which includes our coarse-grained reconfigurable MONTIUM architecture. We already showed the feasibility to implement the baseband processing of OFDM and WCDMA based communication systems in the MONTIUM. In this paper we implemented two kinds of channel decoders in the same MONTIUM architecture: Viterbi and Turbo decoding
    • …
    corecore